Query Term Selection Strategies for Web-based Chinese Factoid Question Answering
Abstract
Passage retrieval plays an important role in a Chinese factoid Question Answering (QA) system. Query term selection is the process of choosing keywords from a given question so as to make the best use of information retrieval engines. Query terms selected by humans are analyzed to measure the difficulty of the task and to evaluate machine-generated results. Three approaches are studied in this paper: stop word elimination, rule-based selection, and machine learning-based selection. Eliminating stop words is the simplest. Heuristic rules produced by morphologists are more complex. Conditional Random Fields (CRF), a machine learning approach, are adopted for labeling query terms. For evaluation, two sets of metrics are proposed. Passage MRR/Coverage relies on search engine results, which relate directly to QA performance, but it is time-consuming to compute and may vary over time. Our experiment shows that Query Term Precision/Recall is a viable alternative. The baseline Coverage of sending raw questions to Google is about 53%, while applying the three approaches yields 65% for stop word elimination, 57% for the rule-based approach, and 54% for the machine learning-based approach. The MRR of sending raw questions to Google is 0.33, while applying the three approaches yields 0.44 for stop word elimination, 0.41 for the rule-based approach, and 0.38 for the machine learning-based approach. The results can be used not only in factoid QA systems but also as a preprocessor for search engines.
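The abstract names the metrics without defining them, so the following Python sketch uses the conventional definitions as an assumption: MRR is the mean reciprocal rank of the first retrieved passage containing the answer, Coverage is the fraction of questions with at least one such passage, and Query Term Precision/Recall compares machine-selected terms against human-selected ones. All function names, the substring-match test for "contains the answer", and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Set, Tuple


def remove_stop_words(tokens: Sequence[str], stop_words: Set[str]) -> List[str]:
    """Stop word elimination: keep every token that is not a stop word.
    Assumes the question has already been segmented into tokens."""
    return [t for t in tokens if t not in stop_words]


def first_hit_rank(passages: Sequence[str], answer: str) -> Optional[int]:
    """1-based rank of the first passage containing the answer string, or None.
    Substring matching is an illustrative assumption."""
    for rank, passage in enumerate(passages, start=1):
        if answer in passage:
            return rank
    return None


def passage_mrr_and_coverage(
    runs: Sequence[Tuple[Sequence[str], str]]
) -> Tuple[float, float]:
    """runs holds one (ranked passages, gold answer) pair per question."""
    ranks = [first_hit_rank(passages, answer) for passages, answer in runs]
    n = len(ranks)
    if n == 0:
        return 0.0, 0.0
    hits = [r for r in ranks if r is not None]
    mrr = sum(1.0 / r for r in hits) / n   # questions with no hit contribute 0
    coverage = len(hits) / n               # share of questions with any hit
    return mrr, coverage


def term_precision_recall(
    selected: Set[str], reference: Set[str]
) -> Tuple[float, float]:
    """Compare machine-selected query terms with human-selected ones."""
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    print(remove_stop_words(["谁", "是", "美国", "总统"], {"谁", "是"}))
    runs = [(["a b", "x answer1"], "answer1"), (["no", "none"], "zzz")]
    print(passage_mrr_and_coverage(runs))
    print(term_precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"}))
```

Under these assumptions, the term-level metrics need only the question and a human-annotated reference set, whereas the passage-level metrics need a fresh ranked result list for every query, which is consistent with the abstract's remark that the latter are time-consuming and can vary over time.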
Related resources
Boosting Passage Retrieval through Reuse in Question Answering
Question Answering (QA) is an emerging important field in Information Retrieval. In a QA system the archive of previous questions asked from the system makes a collection full of useful factual nuggets. This paper makes an initial attempt to investigate the reuse of facts contained in the archive of previous questions to help and gain performance in answering future related factoid questions. I...
Monolingual Web-Based Factoid Question Answering In Chinese, Swedish, English And Japanese
In this paper we extend the application of our statistical pattern classification approach to question answering (QA) which has previously been applied successfully to English and Japanese to develop two prototype QA systems in Chinese and Swedish. We show what data is necessary to achieve this and also evaluate the performance of the two new systems using a translation of the TREC 2003 factoid...
Open-Domain Non-factoid Question Answering
We present an end-to-end system for open-domain non-factoid question answering. We leverage the information on the ever-growing World Wide Web, and the capabilities of modern search engines to find the relevant information. Our QA system is composed of three components: (i) query formulation module (QFM) (ii) candidate answer generation module (CAGM) and (iii) answer selection module (ASM). A t...
Component Analysis of a Chinese Factoid Question-Answering System
An analysis is provided for three major components of a simple Chinese Question-Answering system: passage retrieval, entity extraction and candidate selection. The order of least effective component is determined to be: answer selection, retrieval and extraction. In crosslingual QA, deficiencies in question translation not only lead to retrieval loss, but may also have adverse effects at answer...
Answering the Hard Questions
We present an end-to-end system for open-domain non-factoid question-answering. To accomplish this we leverage the information on the ever-growing World Wide Web, and the capabilities of commercial search engines to find the relevant information. Our QA system is composed of three components: (i) query formulation module (QFM) (ii) candidate answer generation module (CAGM) and (iii) answer sele...